SCAV Tools Wiki
__TOC__
Welcome to the SCAV Tools Wiki
We created this wiki to explain how to use SCAV Tools and the theory behind it.

What is SCAV Tools?
SCAV Tools is a Python-based interface for encoding an audio file using different approaches. It is intended for educational purposes. It is open source so that anyone can modify the functions as much as they please and easily evaluate the effect of their changes.

It was developed by Joan Barceló and Joan Cañellas as the final project for the subject "Audio and Voice Codification Systems" (Sistemes de Codificació d'Audio i Veu in Catalan, abbreviated as SCAV), which is taught at the Universitat Pompeu Fabra (UPF) in Barcelona as one of the mandatory subjects of the "Audiovisual Systems Engineering" degree.

Most of the code is translated from the MATLAB assignments we had to deliver during the course, which were partially based on skeleton code that we had to fill in. Other parts of the code (the interface code and some utilities) are adapted from the sms-tools package developed by the Music Technology Group of the UPF.

User Guide

Installation
To use these tools you have to install Python 2.7.* and the following modules: numpy, matplotlib, scipy and bitstring. Unix-based operating systems (Ubuntu, Mac OS X...) already have Python installed, but not the modules.

In Ubuntu (strongly recommended), installing all these modules is as simple as typing in the Terminal:
$ sudo apt-get install python-dev python-numpy python-matplotlib python-scipy bitstring

In OS X you install these modules by typing in the Terminal:
$ pip install numpy matplotlib scipy bitstring

Although we do not recommend using Python on Windows (there is no support from the developers), Python for Windows can be downloaded here. To install the modules through the Python terminal for Windows:
cd C:/Python/Scripts/
pip.exe install

Usage
This application is developed in Python.
Therefore, to run it, follow these steps:
# Go to the models_interface directory using the terminal (e.g. $ cd scav-tools/software/models_interface/)
# Launch the interface of the application with the following command: $ python models_GUI.py
The interface has one tab for each of the functions listed below.

Uniform Quantizer
In this tab we can quantize a PCM sound using the number of bits that we choose. First, choose a mono .wav file, which can then be played. Then choose the number of bits and finally press the quantize button. The output file can then be played.

Besides producing the coded file and the decoded .wav file, the app shows the waveform and the magnitude spectrogram of the input and output sounds, the plot of the SNR, the bitrate of the coded file (in kbps), the compression ratio, the mean SNR and the execution time.

DFT Coder
In this tab we can encode a PCM sound using a DFT coder. First, choose a mono .wav file, which can then be played. Then choose the number of bands, the window size and the number of bits per band, and finally press the quantize button. The output file can then be played.

Besides producing the coded file and the decoded .wav file, the app shows the waveform and the magnitude spectrogram of the input and output sounds, the plot of the SNR, the bitrate of the coded file (in kbps), the compression ratio, the mean SNR and the execution time.

MDCT Coder
In this tab we can encode a PCM sound using an MDCT coder. First, choose a mono .wav file, which can then be played. Then choose the number of bands, the window size and the number of bits per band, and finally press the quantize button. The output file can then be played.
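
The figures that every tab reports (SNR, bitrate in kbps and compression ratio) can be illustrated with the simplest coder above, the uniform quantizer. The following is only a minimal sketch, not the code of SCAV Tools: the function names, the assumption that samples are normalised to [-1, 1], the 16-bit reference for the compression ratio and the whole-signal SNR definition are all our own choices for illustration.

```python
import numpy as np


def midtread_quantize(x, n_bits):
    """Map samples in [-1, 1] to signed integer codes (0 is a valid level).

    Assumes n_bits >= 2; this is an illustrative helper, not the SCAV Tools one.
    """
    levels = 2 ** (n_bits - 1) - 1  # largest positive code
    return np.round(np.clip(x, -1.0, 1.0) * levels).astype(int)


def midtread_dequantize(codes, n_bits):
    """Map integer codes back to amplitudes in [-1, 1]."""
    levels = 2 ** (n_bits - 1) - 1
    return codes / float(levels)


def snr_db(original, decoded):
    """Signal-to-noise ratio in dB, computed over the whole signal."""
    noise = original - decoded
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))


def bitrate_kbps(n_bits, fs):
    """Bitrate of a mono stream with n_bits per sample at fs Hz."""
    return n_bits * fs / 1000.0


def compression_ratio(source_bits, coded_bits):
    """Ratio between the bits per sample of the source and of the coded file."""
    return source_bits / float(coded_bits)


# One second of a 440 Hz tone as a stand-in for a mono .wav file.
fs = 44100
t = np.arange(fs) / float(fs)
x = 0.9 * np.sin(2 * np.pi * 440 * t)

for n_bits in (4, 8):
    y = midtread_dequantize(midtread_quantize(x, n_bits), n_bits)
    print("%d bits: SNR %.1f dB, %.1f kbps, compression ratio %.1f"
          % (n_bits, snr_db(x, y), bitrate_kbps(n_bits, fs),
             compression_ratio(16, n_bits)))
```

Running the loop with more bits should raise the SNR and lower the compression ratio, which is the trade-off the tabs let us explore.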
Besides producing the coded file and the decoded .wav file, the app shows the waveform and the magnitude spectrogram of the input and output sounds, the plot of the SNR, the bitrate of the coded file (in kbps), the compression ratio, the mean SNR and the execution time.

Bands Coder
In this tab we can encode a PCM sound using a band coder. First, choose a mono .wav file, which can then be played. Then choose the bitrate and the window size, and finally press the quantize button. The output file can then be played.

Besides producing the coded file and the decoded .wav file, the app shows the waveform and the magnitude spectrogram of the input and output sounds, the plot of the SNR, the bitrate of the coded file (in kbps), the compression ratio, the mean SNR and the execution time.

Perceptual Coder
In this tab we can encode a PCM sound using a perceptual coder. First, choose a mono .wav file, which can then be played. Then choose the bitrate and the window size, and finally press the quantize button. The output file can then be played.

Besides producing the coded file and the decoded .wav file, the app shows the waveform and the magnitude spectrogram of the input and output sounds, the plot of the SNR, the bitrate of the coded file (in kbps), the compression ratio, the mean SNR and the execution time.

Decoder
In this tab we can decode any type of coded file and obtain a decoded .wav file. First, choose a coded file, then press the Decode button. Finally, the output file can be played. The app indicates where the output file can be found.

Functions Description

BARK
This function converts a linear frequency specified in hertz to the Bark scale.

FTTBARK
This function converts FFT bins to the Bark scale. It uses the FFT length and the sampling frequency. This function uses the function BARK.

SCHROEDER
This function calculates the masking spectrum for a given frequency and SPL with the Schroeder spreading function.
10\log_{10} F(dz) = 15.81 + 7.5(dz + 0.474) - 17.5\left(1 + (dz + 0.474)^2\right)^{1/2}
Here dz is the distance in Bark from the masker, and the result is in dB. The function uses the sampling frequency, the frame length, the frequency peak location in hertz and the difference between the peak and the mask levels in dB. This function uses the function BARK.

MIDTREAD_QUANTIZER
This function quantizes a signal with the number of levels that we assign. It uses the signal and the number of bits that we want to use.

MIDTREAD_DEQUANTIZER
This function dequantizes a signal that was quantized with the number of levels that we assign. It uses the signal and the number of bits that we want to use.

MDCT
This function returns the Modified Discrete Cosine Transform (MDCT) of a signal. It needs a vector containing the signal.

IMDCT
This function returns the inverse Modified Discrete Cosine Transform of a signal. It needs a vector containing the MDCT of a signal.

P_ENCODE
This function quantizes the MDCT of a frame with the bits indicated in 'bit_alloc'. It uses the MDCT of a frame, the sampling frequency, the frame length and bit_alloc (an array containing the bits assigned to each band). It returns the quantized gain and the quantized MDCT.

ENFRAME
This function splits a signal into frames. We can choose the type of window applied to each frame and the overlap between frames.

ALLOCATE
This function allocates bits to each band given a certain bitrate (from which a bit pool is generated). The allocation is based on the level 'y', which is the SMR for the perceptual model and simply the signal SPL for the Bands coder.

Conclusions
Category:Browse
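
To make the psychoacoustic helpers listed under Functions Description more concrete, here is a minimal sketch of a Hz-to-Bark conversion and of the Schroeder spreading function. It is an illustration under stated assumptions rather than the SCAV Tools source: the Bark conversion uses the Zwicker and Terhardt approximation (the wiki does not say which formula BARK implements), and all function names, signatures and the helper `masking_curve_db` are hypothetical.

```python
import numpy as np


def bark(f_hz):
    """Hz -> Bark with the Zwicker-Terhardt approximation (our assumption)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return (13.0 * np.arctan(0.00076 * f_hz)
            + 3.5 * np.arctan((f_hz / 7500.0) ** 2))


def fft_bin_to_bark(n_fft, fs):
    """Bark value of each non-negative bin of an n_fft-point FFT at fs Hz."""
    bins = np.arange(n_fft // 2 + 1)
    return bark(bins * fs / float(n_fft))


def schroeder_db(dz):
    """Schroeder spreading function in dB at a Bark distance dz from the masker."""
    d = np.asarray(dz, dtype=float) + 0.474
    return 15.81 + 7.5 * d - 17.5 * np.sqrt(1.0 + d ** 2)


def masking_curve_db(fs, n_fft, f_peak_hz, peak_spl_db, offset_db):
    """Masking level per FFT bin for a single masker (hypothetical helper).

    offset_db is the difference between the peak level and the mask level.
    """
    z = fft_bin_to_bark(n_fft, fs)
    return peak_spl_db - offset_db + schroeder_db(z - bark(f_peak_hz))


# Masking curve of an 80 dB SPL peak at 1 kHz, with the mask 15 dB below it.
mask = masking_curve_db(44100, 1024, 1000.0, 80.0, 15.0)
print("bin with the highest masking level: %d" % int(np.argmax(mask)))
```

The spreading function peaks at roughly 0 dB when dz is 0 and falls off more steeply towards lower frequencies than towards higher ones, which is why the curve is evaluated on the Bark axis instead of in hertz.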